
2 research outputs found

Non-acted multi-view audio-visual dyadic interactions. Project non-verbal emotion recognition in dyadic scenarios and speaker segmentation

Author: Lázaro Herrasti Pablo
Publication date: 02/09/2019

Final projects of the Màster de Fonaments de Ciència de Dades (Master in Fundamentals of Data Science), Facultat de Matemàtiques, Universitat de Barcelona. Year: 2019. Advisors: Sergio Escalera Guerrero and Cristina Palmero.
This Master's thesis focuses on developing a baseline emotion recognition system for a dyadic environment, using raw and handcrafted audio features together with faces cropped from the videos. The system is analyzed at frame and utterance level without temporal information. A baseline speaker segmentation system has also been developed to facilitate the annotation task. To that end, an exhaustive study of the state of the art in emotion recognition and speaker segmentation has been conducted, paying particular attention to deep learning techniques for emotion recognition and to clustering for speaker segmentation. In parallel with this theoretical study, a dataset of videos of dyadic interaction sessions between individuals in different scenarios has been recorded. Different attributes were captured and labelled from these videos: body pose, hand pose, emotion, age, gender, etc. Once the emotion recognition architectures have been trained on another dataset, a proof of concept is carried out with this new database in order to draw conclusions. In addition, this database can help future systems achieve better results. A large number of experiments with audio and video are performed to build the emotion recognition system. The IEMOCAP database is used for the training and evaluation experiments of the emotion recognition system. Once the audio and video models have been trained separately with two different architectures, the two are fused. This work studies and demonstrates the importance of data preprocessing (face detection, analysis window length, handcrafted features, etc.) and of choosing the correct parameters for the architectures (network depth, fusion, etc.).
The experiments for the speaker segmentation system, in turn, are performed on a piece of audio from the IEMOCAP database. For this system, the preprocessing steps, the problems of an unsupervised approach such as clustering, and the feature representation are studied and discussed. Finally, the conclusions drawn throughout this work are presented, together with possible lines of future work, including new systems for emotion recognition and experiments with the database recorded in this work.
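The abstract does not state how the separately trained audio and video models are fused. As a rough illustration only, the sketch below assumes a late (score-level) fusion that takes a weighted average of the per-class probabilities from each modality. The function names, the `audio_weight` parameter, and the four emotion labels are hypothetical and do not come from the thesis.

```python
from typing import List, Sequence

# Hypothetical label set, chosen only for illustration.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def late_fusion(audio_probs: Sequence[float],
                video_probs: Sequence[float],
                audio_weight: float = 0.5) -> List[float]:
    """Weighted average of two per-class probability vectors (assumed scheme)."""
    if len(audio_probs) != len(video_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    fused = [audio_weight * a + (1.0 - audio_weight) * v
             for a, v in zip(audio_probs, video_probs)]
    total = sum(fused)
    # Renormalise so the fused scores still sum to one.
    return [p / total for p in fused]


def predict(labels: Sequence[str], probs: Sequence[float]) -> str:
    """Return the label with the highest fused score."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


audio = [0.1, 0.6, 0.2, 0.1]   # made-up audio-model output
video = [0.5, 0.2, 0.2, 0.1]   # made-up video-model output
equal = late_fusion(audio, video)
video_heavy = late_fusion(audio, video, audio_weight=0.2)
```

With equal weights the audio model's preference wins in this toy input, while shifting weight toward video flips the decision, which is one reason a fusion weight is usually tuned on held-out data.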

Diposit Digital de la Universitat de Barcelona

Collection and use of massive data in static handwritten signature verification systems (Recopilación y uso de datos masivos en sistemas de verificación de firma manuscrita estática)

Author: Lázaro Herrasti Pablo
Publication date: 01/06/2018

This work studies, implements, and evaluates a static-signature biometric recognition system based on Convolutional Neural Networks (CNNs). To that end, the work begins with a study of the techniques that have shaped the state of the art, with special emphasis on Convolutional Neural Networks. Once the state of the art was understood from a theoretical point of view, a global database was built from smaller ones, with the aim of creating a large database that integrates different scenarios, devices, and writing tools, so that the problems arising from device interoperability and multiple writing tools can be analyzed in the future. The online database is created first, and the offline database is later obtained from the online information. Several versions of the database were produced, taking into account different information such as in-air trajectory information (penups) or pressure. In addition, a clear and concise naming convention is followed, which will allow the database to be used and accessed easily in the future. After the database is created, the architecture of the signature recognition system used in the different experiments is built. Two databases other than the one created in this work are used for the training and evaluation experiments of the system. Traditionally, signature recognition has relied on techniques other than CNNs and on real offline signatures. In this work, depending on the results obtained, changes were made both to the CNN architecture and to the parameters of the synthetic offline signature (penups, pressure, etc.). Moreover, the importance of some of these parameters has been demonstrated, along with how the features extracted by the convolutional layers change as a function of them.
Finally, the conclusions drawn throughout this work are presented, together with possible lines of future work, which include improving the results for the created database and studying the interoperability problem it presents.
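The abstract says the offline database is derived from the online one, with versions that do or do not include penup information, but it does not describe the rendering procedure. The following is a minimal sketch under stated assumptions: each online sample is a hypothetical `(x, y, pressure)` tuple with integer pixel coordinates, a pressure of zero marks a pen-up sample, and the image is a binary grid. The function `render_offline` and its parameters are illustrative names, not the author's implementation.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[int, int, float]  # (x, y, pressure); pressure 0 means pen-up (assumption)


def render_offline(samples: Sequence[Sample], width: int, height: int,
                   include_penups: bool = False) -> List[List[int]]:
    """Rasterise an online trajectory into a binary image (rows of 0/1)."""
    image = [[0] * width for _ in range(height)]
    for (x0, y0, p0), (x1, y1, p1) in zip(samples, samples[1:]):
        pen_down = p0 > 0 and p1 > 0
        if not pen_down and not include_penups:
            continue  # skip in-air movement unless penups are requested
        # Linear interpolation between consecutive samples.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                image[y][x] = 1
    return image


# Toy trajectory: one written stroke, then an in-air move back to the left.
trajectory = [(0, 0, 1.0), (3, 0, 1.0), (3, 2, 0.0), (0, 2, 1.0)]
ink_only = render_offline(trajectory, width=4, height=3)
with_penups = render_offline(trajectory, width=4, height=3, include_penups=True)
```

Comparing `ink_only` with `with_penups` shows how the same online signature yields different synthetic offline images depending on whether the in-air trajectory is drawn.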

Biblos-e Archivo